Abstract

This study examines the predictability of GRE reading item difficulty
(equated delta) for three major reading item types: main idea, inference, and
explicit statement items. Each item type is analyzed separately, using 110
GRE reading passages and their associated 244 reading items; selected
analyses of 285 SAT reading items are also presented. Stepwise regression
analyses indicate that the percentage of GRE delta variance accounted for
varied from 20% to 52%, depending on the item type.

Details of item predictability were explored by evaluating several
hypotheses. Results indicated that (1) multiple-choice reading items are
sensitive to variables similar to those reported in the experimental
literature on comprehension, (2) many of these variables provide independent
predictive information in regression analyses, and (3) GRE and SAT reading
item predictability agree substantially.
Introduction

Purpose of Current Study

The primary purpose of the current study is to predict reading item
difficulty for each of three GRE reading item types (main idea, inference,
and explicit statement items), which together constitute about 75% of the
reading items. To achieve this goal, we need to identify a set of variables
that earlier studies suggest should be predictive of comprehension difficulty.
Confirming evidence that these previously identified variables do predict GRE
reading comprehension item difficulty can be taken as evidence favoring the
claim that the GRE reading section is in fact a measure of passage
comprehension. Such an outcome might lead to modifications in statements made
recently by Royer (1990) as well as by Katz, Lautenschlager, Blackburn, and
Harris (1990), who have argued that multiple-choice reading tests are
primarily tests of reasoning rather than of passage comprehension per se;
these arguments are presented in greater detail below.

Background Studies 

Only a few studies appear to have focused on predicting item difficulty
using items from standardized ability tests (Drum, Calfee, & Cook, 1981;
Embretson & Wetzel, 1987). While not specifically focused on predicting
reading item difficulty, many other studies of language processing have
isolated a wide variety of variables that influence comprehension difficulty,
as measured by decision time and recall. Studies of particular interest here
include the study of negations by Carpenter and Just (1975); the study of
rhetorical structure (Grimes, 1975) and its effect on the accuracy of prose
recall (Meyer, 1975; Meyer & Freedle, 1984) and on prose comprehension (Hare,
Rabinowitz, & Schieble, 1989); the use of referential expressions in
constructing meaning (Clark & Haviland, 1977); and the use of syntactic
"frontings" (see details below) that appear to guide the interpretation of
semantic relationships within and across paragraphs (see Freedle, Fine, &
Fellbaum, 1981). The particular manner in which these selected variables are
employed will become evident later in this report. Using this set of
hypothetically relevant variables, our primary strategy has been to capture
the large- and small-scale structures of the reading passages, and of their
associated items, in order to best account for observed reading item
difficulty in a multiple-choice testing context.

First we review those studies that predict reading item difficulty for 
multiple-choice tests. 




Drum, Calfee, and Cook (1981) predicted item difficulty using various
surface structure variables and word frequency measures for the text, and
several item variables that also depended on surface structure
characteristics (e.g., number of words in the stem and options, number of
words with more than one syllable). They reported good predictability using
these simple surface variables; on average, they indicated that about 70% of
the variance of multiple-choice reading item difficulty was explained.

Embretson and Wetzel (1987) also studied the predictability of the
difficulty of 75 reading items using a few of the surface variables studied by
Drum et al. (1981). In addition, because their passages were brief, Embretson
and Wetzel were able to do a propositional analysis (see Kintsch & van Dijk,
1978) and add variables from this analysis, along with several other measures,
as predictor variables. In particular, they found that connective propositions
were significant predictors. We believe that Meyer's (1975) top-level
rhetorical structures, which we include in the present study, indirectly
assess the presence of connectives (such as "and," "but," "however," "since,"
"because," etc.), since each of the rhetorical devices emphasizes these
connectives differently. For example, a top-level causal structure tends to
use connectives such as "since" and "because." A list structure tends to use
connectives such as "and" and "then," while a comparative structure will often
employ connectives such as "however" or "yet."

We now review additional studies of variables that have been found to
influence reading comprehension difficulty. Most of these variables were
investigated in empirical studies that did not use multiple-choice methods to
yield an index of comprehension difficulty. Instead, many used dependent
measures such as passage recall or decision time to infer the influence that
certain variables have on comprehension difficulty. This review, along with
our earlier review of the Drum et al. (1981) and Embretson and Wetzel (1987)
studies, will help us to select a final set of variables that we postulate
may also index comprehension difficulty within a multiple-choice testing
format.

Carpenter and Just (1975) found that the occurrence of sentence negation
increases comprehension decision time. This suggests that the number of
negations contained in GRE reading passages may also influence multiple-choice
item difficulty. Furthermore, one can ask whether additional negations used
in the item structure itself (either in the item stem or among the response
options) may separately contribute to comprehension difficulty over and above
the contribution of text negations.
Abrahamsen and Shelton (1989) demonstrated improved comprehension of
texts that were modified, in part, so that full noun phrases were substituted
for referential expressions. This suggests that texts with many referential
expressions may be more difficult than ones with few. Again, to study more
broadly the effect of the number of referential expressions on the
comprehension difficulty of multiple-choice tests, a separate count is also
made of referential expressions that occur in the item proper.

Hare et al. (1989) studied, in part, the effect of four of Grimes'
(1975) rhetorical organizers on the difficulty of identifying the main idea of
passages: students either wrote out the main idea if it was not explicitly
stated or underlined it if it was explicitly stated. They found a significant
effect of rhetorical organization such that list-type structures (see
definitions and examples below) facilitated main idea identification, whereas
some nonlist organizers made main idea information more difficult to locate.
Meyer and Freedle (1984) examined the effect of the Grimes organizers on the
ability of students to recall passages that contained the same semantic
information but differed in their top-level rhetorical organization. Like
Hare et al., they found that list structures facilitated recall (for older
subjects). However, they also reported that university students were helped
most by comparative organizations; this latter finding was not replicated by
Hare et al.

It seems likely that rhetorical organization will contribute to
comprehension difficulty within a multiple-choice testing format; however,
given the differences between the Meyer and Freedle (1984) and Hare et al.
(1989) results, it is not clear whether we can say in advance which type of
structure will be found to facilitate performance. Top-level rhetorical
structure meaningfully applies only to the text structure; a comparable entry
for items is not feasible.

Freedle, Fine, and Fellbaum (1981) report differences in the use of
"fronted" structures at sentence beginnings (and paragraph beginnings) as a
function of the judged quality of student essays. Fronted structures included
the following: (1) cleft structures ("It is true that she found the dog,"
where the initial "it" is a dummy element having no referent); (2) marked
topics, consisting of several subtypes: (a) opening prepositional phrases or
adverbials ("In the dark, all is uncertain"; "Quickly, near the lodge, the
boat overturned") or (b) initial subordinate clauses ("Whenever the car
stalled, John would sweat"); and (3) combinations of coordinators and marked
topics or cleft structures that begin independent clauses ("But, briefly, this
didn't stop him"; "And, furthermore, it seems that is all one should say").

Freedle et al. (1981) showed that these different fronting structures
significantly discriminate among levels of essay quality: the better essays
contained a higher mean frequency of each of these fronted structures, even
after partialling out the effect of the different essay lengths associated
with ability level. They interpreted these fronted structures as authors'
explicit markers for guiding readers to uncover the relationships that exist
among independent clauses. It is not immediately clear whether differential
use of such structures would facilitate or inhibit comprehension of GRE
passages. If we assume that the structures produced by the more able writers
are more difficult to learn, then one can predict that the more frequently
these fronted structures occur, the more difficult the text should be to
understand. In support of this, Clark and Haviland (1977) suggest that at
least cleft structures may be harder to understand than simple declarative
sentences. Also, Bever and Townsend (1979) found that sentences in which the
main clause follows a subordinate clause are more difficult to process than
sentences in which the main clause occurs first (this overlaps somewhat with
frontings, since initial subordinate clauses count as one type of fronting).
By including a count of all such variables, we can explicitly test how clefts
and other fronted structures might affect comprehension difficulty in a
multiple-choice testing context. This is done separately for text and for
item content.

While Kieras (1985) specifically focused on the perception of main idea
information in reading, his study is potentially relevant for all three item
types treated in the current study. We first summarize Kieras' work and then
generalize it to include inference and explicit statement items.

Kieras (1985) examined, in part, how students perceived the relative
location of main idea information in short paragraphs. Using single-paragraph
passages extracted from technical manuals, he found that most students
perceived main idea information as located early in the paragraph, and a few
thought the main idea occurred at or near the end of the paragraph;
information in the middle of the paragraph was least often perceived as a
statement of the main idea. Kieras did not report the relative frequencies
with which the actual main ideas occurred in each location, so it is difficult
to determine whether students tended to select the opening sentences as
containing the main idea because most of the passages placed the key idea
there or because the students were simply reflecting a response bias toward
the opening sentences. Unless the main idea was equally represented across
locations in the stimulus passages, the Kieras results are ambiguous.

However, the work of Hare et al. (1989) helps to clarify this issue. In
one of their studies they systematically varied the location of a main idea
sentence across three positions: the opening sentence, the medial sentence,
or the final sentence of a paragraph. The experimental subjects underlined the
sentence they thought was the main idea sentence. Correct identifications
were greatest when the main idea sentence occurred first. One can infer from
the Hare et al. results that two tendencies contribute to main idea
correctness: opening sentences that do contain the main idea tend to be
selected partly because of a prior bias to select early sentences, but also
because students are attempting to understand the information in the text
sentences.

One can generalize the Hare et al. (1989) and Kieras (1985) findings to
suggest that location effects may be relevant to how students respond to
multiple-choice items for multiparagraph passages. If students tend to
perceive early text information, especially information in the opening
sentences of the first paragraph, as main idea information (see Kintsch's,
1974, earlier work in this regard), then items whose passages confirm this
search strategy should be easier than items whose passages disconfirm it.
Here, disconfirming main idea information is information that occurs in the
middle of a multiparagraph text; it is disconfirming only because it fails to
conform to the expectation that main idea information "should" be near the
beginning of a passage. So the relative ordering of difficulty should be as
follows: main idea items whose correct answer matches information in the
opening sentences will be easiest (other things being equal), while main idea
items whose relevant information occurs near the middle of the text will be
hardest. As we have already suggested, such a result might be expected from
Kintsch's earlier text processing theory.

Since we also intend to study inference and explicit statement items, we
might ask whether the Kieras (1985) and Hare et al. (1989) findings about the
relative location of main idea information in the passage will also help
account for item difficulty for these other two reading item types.

Explicit statement items are of the following type: "According to the
passage, x occurs when...." It seems reasonable to expect that if the
relevant explicit information occurs early in the passage, the item should
tend to be easy. But if the relevant explicit information is located near the
middle of the passage, the item should be more difficult. If so, this
generalizes our interpretation of Kieras' (1985) results for main ideas to
explicit statement information. We hypothesize that the surface location of
relevant information influences the results. While for main ideas one
normally expects early text information to contain the relevant main idea,
there is no corresponding expectation for explicit statement information.
Nevertheless, the beginning of a passage may be especially salient even for
explicit statement items, not because a prior expectation is confirmed or
disconfirmed but, more simply, because examinees may start their search for
explicit information at the beginning of the passage.

A similar argument can be made for inference items. Inference items
usually have the following format: "One can infer from the passage that x
means...." If the relevant text information needed to carry out the inference
is located near the beginning of the passage, this might make the item easier
to answer correctly. But if the relevant text information is in the middle,
the item will probably be more difficult.
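The location hypothesis for all three item types can be made concrete with a small coding scheme. The sketch below is only an illustration under stated assumptions: it treats the passage as one sequence of sentences, labels the first, middle, and last thirds of that sequence "opening," "middle," and "end," and assumes that end-of-passage information is of intermediate difficulty. The function name `relative_location` and the rank table are hypothetical and are not the coding procedure used in this study.

```python
def relative_location(sentence_index: int, n_sentences: int) -> str:
    """Classify where the item-relevant sentence falls in a passage.

    Hypothetical scheme: the passage is split into thirds by sentence
    position (0-based index) and labeled 'opening', 'middle', or 'end'.
    """
    if n_sentences < 1 or not 0 <= sentence_index < n_sentences:
        raise ValueError("sentence_index must lie within the passage")
    if n_sentences == 1:
        return "opening"
    position = sentence_index / (n_sentences - 1)  # 0.0 = first, 1.0 = last
    if position < 1 / 3:
        return "opening"
    if position > 2 / 3:
        return "end"
    return "middle"


# Predicted difficulty rank (1 = easiest). Opening-easiest and middle-hardest
# follow the hypothesis in the text; placing 'end' in between is an assumption.
PREDICTED_DIFFICULTY_RANK = {"opening": 1, "end": 2, "middle": 3}

for idx in (0, 5, 11):
    label = relative_location(idx, 12)
    print(idx, label, PREDICTED_DIFFICULTY_RANK[label])
```

Splitting by thirds is only one plausible operationalization; a coder could instead distinguish the first sentence of the first paragraph from everything else, which is closer to the Kieras framing.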

Other variables that one can hypothesize will affect comprehension
difficulty on multiple-choice tests are vocabulary level (Graves, 1986);
various measures of sentence complexity, such as sentence length (Klare,
1974-1975); paragraph length (Hites, 1950); number of paragraphs (Freedle,
Fine, & Fellbaum, 1981); and abstractness of text (Paivio, 1986). In
particular, less frequently occurring words and longer sentences tend to make
texts more difficult to understand, as can be inferred from their use in
traditional readability formulas (Graves, 1986); in addition, longer
paragraphs and more abstract texts also make passages more difficult to
comprehend (see Hites, 1950, and Paivio, 1986, respectively). Use of more
paragraphs was found to be positively correlated with the quality of written
essays (Freedle, Fine, & Fellbaum, 1981); it remains to be seen whether the
number of paragraphs itself contributes to reading comprehension difficulty
in a multiple-choice testing context.

It will be useful to collect the variables reviewed above into a single set.
One can hypothesize that many of these variables, which are known to
contribute to comprehension difficulty in non-multiple-choice testing formats
(or to quality judgments of written essays), will be found to significantly
affect comprehension measures obtained within a multiple-choice testing
format. Stated more succinctly:

Hypothesis 1. The following variables are expected to significantly
influence reading item difficulty as determined within a multiple-choice 
testing format: 
a. negations

b. referentials

c. rhetorical organizers

d. fronted structures

1. cleft structures

2. marked topics

3. combinations (of coordinators and marked topics, or of coordinators
with cleft structures)

e. vocabulary

f. sentence length

g. paragraph length

h. number of paragraphs

i. abstractness of text

j. location of relevant text information



Relevance of hypothesis 1 to criticisms of multiple-choice reading tests
as tests of passage comprehension. Hypothesis 1 is an important hypothesis,
particularly as it applies to the coding of the passage, for reasons we
explain below. Royer (1990) indicates that "There is evidence that
standardized reading comprehension tests that utilize multiple-choice
questions do not measure the comprehension of a given passage. Instead they
seem to measure a reader's world knowledge and his or her ability to reason
and think about the contents of a passage" (p. 162). Royer then cites work by
Tuinman (1973-1974), Drum et al. (1981), and Johnston (1984) to bolster this
claim. Tuinman's work resembles the findings of Katz et al. (1990), who showed
that multiple-choice reading items are answered correctly at above-chance
levels in the absence of the reading passage. Of course, Katz et al. also show
that correct responses increase significantly when the passages are
available to a control group. Hence Royer appears to have overgeneralized the
importance of item structure alone in concluding that multiple-choice reading
tests do not measure passage comprehension. That is, if multiple-choice tests
of reading did not tap passage comprehension and were solely a reflection of
world knowledge and reasoning ability, then adding the passage should have
had no noticeable effect on reading item correctness. Since Katz et al.
clearly show a significant increase in item correctness when the passage was
available, one must conclude that multiple-choice reading tests do measure
passage comprehension while simultaneously tapping other abilities, such as
reasoning.

Royer's (1990) citation of Drum et al. (1981) also concerns the claimed
importance of item structure alone to reading comprehension item correctness.
Incorrect option plausibility was the most important predictor in Drum et
al.'s study, and they classified it as an item variable. However, we claim
that incorrect option plausibility is more accurately classified as a text by
item interaction, not just an item variable. That is, in order to decide
whether an incorrect option is a plausible answer, one must scan not only the
item information but the text information as well. Hence Drum et al.'s best
predictor necessarily implicates the reading of the text. We conclude that
Royer's acceptance of Drum et al.'s classification scheme led him to use
their results (incorrectly, we feel) to further support his hypothesis that
text comprehension does not play a crucial role in multiple-choice reading
tests.

But suppose Royer's (1990) critique of multiple-choice tests is assumed
to be correct. Then there is little reason to expect that the 10 variables
listed under hypothesis 1 (a through j above, at least as they apply to the
coding of the text) will be significantly related to multiple-choice reading
item difficulty. This follows because, by Royer's hypothesis, multiple-choice
tests are not tests of comprehension; thus variables known (from the
experimental literature) to be related to comprehension difficulty should not
correlate with performance on multiple-choice reading comprehension tests.
However, if Royer is incorrect, there is good reason to suppose that most if
not all of the 10 variables listed under hypothesis 1, at least as applied to
the coding of the text, will be found to correlate significantly with reading
item difficulty as obtained from multiple-choice testing.

If supporting evidence is found for hypothesis 1, there is a second
implication that is important to evaluate. Few studies assess the
simultaneous influence of many variables on comprehension (Goodman, 1982).
Furthermore, many of the text materials evaluated in the experimental
literature are not naturalistic texts but are artificially constructed to
test the effect of one or two variables (see Hare et al., 1989, for a related
argument). Because the current GRE passages are selected from naturalistic
texts, it should be possible to evaluate via regression analyses whether the
10 categories of variables of hypothesis 1 contribute independent information
in accounting for reading comprehension item difficulty. This leads to our
second hypothesis.

Hypothesis 2. Many of the 10 categories of variables provide
independent predictive information in accounting for reading item difficulty.

Corollary to hypothesis 2. Confirmation of hypothesis 2, using GRE
data, implies that many of the 10 categories of variables for hypothesis 1
apply to naturalistic texts as well as to the more controlled texts employed
in experimental studies of reading comprehension.
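Hypothesis 2 is evaluated with stepwise regression. As a rough illustration of what "independent predictive information" means operationally, the sketch below runs a simplified forward selection: at each step it adds the predictor that most increases R² (the proportion of delta variance accounted for) and stops when the gain falls below a threshold. This is an assumed stand-in, not the procedure used in this report; conventional stepwise programs typically use F-to-enter and F-to-remove criteria, and the 0.01 threshold and the synthetic data are hypothetical.

```python
import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 of an ordinary least-squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def forward_stepwise(X, y, names, min_gain=0.01):
    """Greedy forward selection: add the predictor with the largest R^2 gain
    until no remaining predictor adds at least `min_gain` (hypothetical rule)."""
    selected, remaining, current_r2 = [], list(range(X.shape[1])), 0.0
    while remaining:
        gain, best = max(
            (r_squared(X[:, selected + [j]], y) - current_r2, j) for j in remaining
        )
        if gain < min_gain:
            break
        selected.append(best)
        remaining.remove(best)
        current_r2 += gain
    return [names[j] for j in selected], current_r2


# Synthetic example: 'x0' and 'x2' carry signal, 'x1' is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + 0.1 * rng.normal(size=200)
selected, r2 = forward_stepwise(X, y, ["x0", "x1", "x2"])
print(selected, round(r2, 3))
```

A predictor "provides independent information" in this sense when it still raises R² after the predictors already entered have been taken into account; the noise column never clears the threshold.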
Materials and Method

The total item sample comprises 244 reading comprehension items taken
from 28 GRE forms, representing 110 reading passages. Each test form consists
of four reading passages, two long (400 words or more) and two short (200
words or less). Only main idea (n = 76), inference (n = 87), and explicit
statement (n = 81) items were selected for study. Other item types, such as
author's tone and author's organization, occur infrequently and were not
scored because separate subanalyses of each item type were planned. We also
did not sample items that use a Roman numeral format (e.g., where different
combinations of three elements comprise the list of options, as in (a) only I
is correct, (b) only I and II are correct, (c) I and III are correct, (d) II
and III are correct, (e) none are correct). We also excluded special items
that feature a capitalized NOT or LEAST in the item stem.

The data for each item difficulty measure (equated delta) were based on
approximately 1,800 examinees; these examinees were randomly selected from a
much larger pool of examinees who responded to each GRE test form. The
equated delta value slightly adjusts the difficulty of each item so that items
can be meaningfully compared across forms. The adjustment is needed because
the sample of examinees who respond to a particular test form differs slightly
in overall ability level from the samples responding to other test forms. The
delta of each test form is adjusted so that it has a mean of 13.0 and a
standard deviation of 4.0.
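For readers unfamiliar with the delta scale, the sketch below shows the conventional ETS transformation from an item's proportion correct p to delta: the standard normal deviate corresponding to the proportion answering incorrectly, rescaled to a mean of 13 and a standard deviation of 4, so that harder items receive larger deltas. We assume this standard definition here; the subsequent equating step that adjusts deltas for ability differences across forms is not shown, and the function name `raw_delta` is hypothetical.

```python
from statistics import NormalDist


def raw_delta(p_correct: float) -> float:
    """Convert proportion correct to a (raw, unequated) delta value.

    Assumes the conventional definition: delta = 13 + 4 * z, where z is the
    standard normal deviate below which (1 - p_correct) of the area lies.
    """
    if not 0.0 < p_correct < 1.0:
        raise ValueError("p_correct must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(1.0 - p_correct)
    return 13.0 + 4.0 * z


for p in (0.90, 0.50, 0.10):
    print(p, round(raw_delta(p), 2))
```

An item answered correctly by half the examinees sits at the scale midpoint of 13; easier items fall below it and harder items above it.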

The analysis of each item type was restricted in the following way. Each
passage selected for a particular analysis contributed only a single exemplar
of the item type being analyzed. Thus, for 76 main idea items there were 76
passages; for 87 inference items there were 87 passages; and for 81 explicit
statement items there were 81 passages. Sometimes the same passage was a
source for several of the item types, and sometimes only one item was
associated with a particular passage. All analyses reported below involve
only a single item type. These sampling restrictions were undertaken to avoid
what statisticians call a "nesting" effect.
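The one-item-per-passage restriction can be expressed as a simple filter. The sketch assumes each item is a record with hypothetical `passage`, `type`, and `delta` fields (the delta values below are made up), and, because the report does not say which exemplar was kept when a passage had several items of one type, it simply keeps the first one encountered.

```python
from typing import Dict, List


def one_item_per_passage(items: List[dict], item_type: str) -> List[dict]:
    """Keep at most one item of `item_type` per passage (first one wins).

    The result is an analysis sample in which no passage contributes more
    than one observation, so items are not nested within passages.
    """
    chosen: Dict[str, dict] = {}
    for item in items:
        if item["type"] == item_type and item["passage"] not in chosen:
            chosen[item["passage"]] = item
    return list(chosen.values())


items = [
    {"passage": "P1", "type": "inference", "delta": 12.4},
    {"passage": "P1", "type": "inference", "delta": 14.1},
    {"passage": "P1", "type": "main idea", "delta": 11.0},
    {"passage": "P2", "type": "inference", "delta": 15.2},
]
inference_sample = one_item_per_passage(items, "inference")
print([(it["passage"], it["delta"]) for it in inference_sample])
```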

Most of the independent variables listed below were motivated by the
literature review presented above. These, along with a few additional
variables (e.g., number of rhetorical questions in the passage, type of
passage subject matter, lexical coherence across text paragraphs), had been
used in an earlier study of SAT reading item difficulty using a
multiple-choice testing format (Freedle & Kostin, 1990).
Independent Variables for Representing Text and Item Information

Item variables 

Item type 

v1-- Main idea

v2-- Inference

v3-- Explicit statement

Variables for item's stem 

v4-- Words in Stem: Number of words in stem (the item question)
v5-- Hedges in Stem: Use of hedge (e.g., perhaps, probably) in stem
v6-- Fragment Stem: Use of full question or sentence fragment
v7-- Negative Stem: Use of simple negation

v8-- Fronted Stem: Use of fronting (e.g., use of any phrases or clauses
preceding the subject of the main independent clause, or use of clefts; see
below under text variables for details)

v9-- Reference Stem: Sum of referentials to text, stem, or options (see
below for definitions under text variables)

v10-- Reference Line Stem: Reference made to text lines or paragraphs

Variables for item's correct option

v11-- Answer Position: Ordinal position of correct answer

v12-- Words Correct: Number of words in correct option

v13-- Negative Correct: Use of simple negation(s) in correct option

v14-- Fronting Correct: Use of fronting(s) in correct option

v15-- Reference Correct: Use of referential(s) in correct option

Variables for item's incorrect options 

v16-- Words Incorrects: Number of words summed over all incorrect options
v17-- Negative Incorrects: Use of simple negation(s) summed over incorrect
options

v18-- Fronted Incorrects: Use of fronting(s) summed over incorrect options
v19-- Reference Incorrects: Use of referential(s) summed over incorrect
options

Text Variables 

Vocabulary variable for text 

v20-- Vocabulary: Number of words with three or more syllables in
the first 100 words of the passage (estimates vocabulary difficulty)
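Computing v20 requires a syllable count. The report does not describe how syllables were counted, so the sketch below substitutes a rough vowel-group heuristic. This is a hypothetical approximation that miscounts some words (for example, it treats "variable" as three syllables rather than four), and the function names are illustrative only.

```python
import re

VOWEL_GROUP = re.compile(r"[aeiouy]+")


def estimate_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, dropping a silent final 'e'.

    A heuristic stand-in, not a dictionary-based count.
    """
    letters = re.sub(r"[^a-z]", "", word.lower())
    if not letters:
        return 0
    count = len(VOWEL_GROUP.findall(letters))
    if letters.endswith("e") and not letters.endswith(("le", "ee")) and count > 1:
        count -= 1  # e.g., 'passage' -> 2
    return max(count, 1)


def vocabulary_index(passage: str, n_words: int = 100) -> int:
    """v20-style count: words of three or more syllables among the first n_words."""
    words = re.findall(r"[A-Za-z'-]+", passage)[:n_words]
    return sum(1 for w in words if estimate_syllables(w) >= 3)


print(vocabulary_index("The comprehension of difficult reading passages"))
```

Because only the three-syllable threshold matters for v20, small miscounts on long words rarely change the index; the heuristic is least reliable near the two/three-syllable boundary.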

Concreteness/abstractness of text

v21-- Concreteness: Determines whether the main idea of the text and its
development are concerned with concrete or abstract entities.

Subject matter variables of text 
v22-- Physics
v23-- Biology

v24-- Natural science: Combined v22 and v23 into a single natural
science variable

v25-- Social science: Subjects such as history, anthropology,
economics, sociology, political science

v26-- Humanities: Subjects such as music, architecture, literary
criticism, philosophy

v27-- Natural science excerpt: Represents an "excerpt of natural
science"

v28-- About natural science: Represents a passage "about natural
science"

Type of rhetorical organization 

v29-- Argument: Rhetorical presentation (i.e., author favors one of
several points of view presented in text; occasionally other 
viewpoints may be only implied) 

v30-- List/Describe: Grimes' (1975) rhetorical organizer that
interrelates a collection of elements in a text that are related 
in some unspecified manner; a basis of a list "... ranges from a 
group of attributes of the same character, event, or idea, to a 
group related by simultaneity to a group related by time sequence" 
(Meyer, 1985, p. 270). Describe relates a topic to more 
information about it. We felt this was sufficiently similar to 
list to warrant scoring them as members of the same category. 

v31-- Cause: Another Grimes (1975) rhetorical organizer. "Causation
shows a causal relationship between ideas where one idea is the 
antecedent or cause and the other is a consequent or effect. The 
relation is often referred to as the condition, result, or purpose 
with one argument serving as the antecedent and the other as the 
consequent. The arguments are before and after in time and causally 
related." (Meyer, 1985, p. 271). 

v32-- Compare: Another Grimes (1975) rhetorical organizer. The
comparison relation points out differences and similarities between
two or more topics.
The two subtypes of compare used here are as follows:

v33-- Compare-adversative (this relates a favored view to a less
desirable opposing view) and 

v34-- Compare-alternative (this interrelates equally weighted
alternative options or equally weighted opposing views) (Meyer, 
1985, p. 273). 

v35-- Problem/solution: This is defined as follows: "... similar to
causation in that the problem is before in time and an antecedent 
for the solution. However, in addition there must be some overlap 
in topic content between the problem and solution; that is, at least 
part of the solution must match one cause of the problem. The . . . 
problem and solution . . . are equally weighted and occur at the same 
level in the content structure" (Meyer, 1985, p. 272). 

Coherence of lexical concepts over whole text 

v36-- Coherence (this involves judging whether opening concepts of the
first sentence occur throughout the text paragraphs: 3 = maximum lexical
coherence, ... 0 = no obvious lexical overlap).

Lengths of various text segments 

v37-- Paragraphs: Number of passage paragraphs
v38-- Text words: Number of words in passage
v39-- Text sentences: Number of text sentences

v40-- First paragraph words: Number of words in first paragraph
v41-- Longest paragraph words: Number of words in longest paragraph
v42-- First paragraph sentences: Number of sentences in first
paragraph

v43-- Longest paragraph sentences: Number of sentences in longest
paragraph

v44-- Text sentence words: Average number of words per text sentence
v45-- Text paragraph words: Average number of words per paragraph
v46-- First paragraph sentence length: Average length of sentences in
first paragraph

v47-- Longest paragraph sentence length: Average length of sentences
in longest paragraph
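Most of the length variables v37-v47 are simple counts once a passage has been segmented. The sketch below computes a subset of them under two assumptions the report does not specify: paragraphs are separated by blank lines, and a sentence ends at ".", "!", or "?" followed by whitespace (so abbreviations such as "e.g." would be miscounted). The dictionary keys are hypothetical labels tied to the variable numbers above.

```python
import re


def length_variables(passage: str) -> dict:
    """Compute a subset of the text-length variables (v37-v39, v41, v44, v45)."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", passage) if p.strip()]
    if not paragraphs:
        raise ValueError("passage is empty")
    para_words = [len(p.split()) for p in paragraphs]
    # Naive sentence split: break after ., !, or ? when followed by whitespace.
    sentences = [
        s for p in paragraphs for s in re.split(r"(?<=[.!?])\s+", p) if s
    ]
    n_words = sum(para_words)
    return {
        "v37_paragraphs": len(paragraphs),
        "v38_text_words": n_words,
        "v39_text_sentences": len(sentences),
        "v41_longest_paragraph_words": max(para_words),
        "v44_words_per_sentence": n_words / len(sentences),
        "v45_words_per_paragraph": n_words / len(paragraphs),
    }


text = "George fell. That hurt.\n\nThe man laughed. Then, he frowned. And when he turned, he fell."
print(length_variables(text))
```

The first-paragraph and longest-paragraph variants (v40, v42, v43, v46, v47) follow the same pattern applied to a single selected paragraph.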

Occurrence of different text "frontings": v48-v54 distinguish several
types and combinations of "frontings." Some examples follow.



ERIC 



■1 



13 



Use of theme-marking: In the front, the car rocked.

Fortunately, the car rocked.
Use of coordination: But, the car rocked.

Use of clefts (deferred foci): It is the case that George is short.

There are cases that defy reason.
(It and there function as dummy elements without a referent.)
Use of combinations: And, near the rear, the toy fell.
Longest run of frontings: Number of successive independent clauses
that begin with fronted information (e.g., "The man laughed. Then,
he frowned. And when he turned, fell." This example of three
independent clauses has two successive sentences with fronted
material; hence its run length is "2".)

v48--Percent fronted paragraph openings: Percentage of fronted
clauses in the opening clauses across all paragraphs

v49--Frequency fronted paragraph openings: Frequency of fronted
clauses in the opening clauses across all paragraphs

v50--Percent fronted text clauses

v51--Frequency fronted text clauses

v52--Frequency combinations of fronted text structures

v53--Frequency of text clefts (this is sometimes referred to as
deferred foci, which is one type of fronting)

v54--Longest fronted run: Number of consecutively fronted text
clauses
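The frequency, percentage and run-length codes among v48 to v54 follow directly from clause-by-clause fronting judgments. The sketch below is minimal and assumes those hand-coded judgments are available as lists of booleans: one per independent text clause, and one per paragraph-opening clause. The function and key names are hypothetical. v52 and v53 are omitted because they depend on the type of each fronting, not only on its presence. The run-length logic reproduces the worked example given with the fronting examples. There, the clauses of "The man laughed. Then, he frowned. And when he turned, fell." are coded as not fronted, fronted, fronted, which gives a longest run of 2.

```python
def longest_run(flags: list[bool]) -> int:
    """Length of the longest stretch of consecutive True values (v54)."""
    longest = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        longest = max(longest, current)
    return longest


def fronting_variables(clause_fronted: list[bool],
                       opening_fronted: list[bool]) -> dict[str, float]:
    # clause_fronted: one hand-coded flag per independent text clause, in text order.
    # opening_fronted: one hand-coded flag per paragraph-opening clause.
    n_front = sum(clause_fronted)
    n_open = sum(opening_fronted)
    return {
        "v48_pct_fronted_par_openings": 100.0 * n_open / len(opening_fronted),
        "v49_freq_fronted_par_openings": n_open,
        "v50_pct_fronted_text_clauses": 100.0 * n_front / len(clause_fronted),
        "v51_freq_fronted_text_clauses": n_front,
        "v54_longest_fronted_run": longest_run(clause_fronted),
    }


# Worked example from the text: a one-paragraph passage of three clauses.
example = fronting_variables([False, True, True], [False])
```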

Text questions

v55--Text questions: Number of rhetorical questions in text

Text referentials

v56--Reference within text clauses: Frequency of within-clause
referentials of all text clauses (e.g., "When George fell, he
hurt.")

v57--Reference across text clauses: Frequency of across-clause
referentials (e.g., "George fell. That hurt.")
v58--Frequency special reference: Reference outside text (e.g.,
"One might feel sorry for George.")
v59--Reference sums: Sum of v56, v57, and v58

Text negations

v60--Text negatives: Number of simple negations in text
Special text by item interactions: the location of text information
relevant to answering a particular item correctly. (Note: many item
stems and/or item options specify a specific content to be searched for
in the text [e.g., "according to the text, when the author said x, this
means ...."]; scoring where in the text this linkage of critical stem
information occurs has been designated as a text by item interaction
variable.)

Text by item interactions applicable only to main idea information

v61 to v69--in general these variables specify the location of main
idea information at various places in the surface text.

v61--Main idea first sentence: Main idea information is in first
sentence of text

v62--Main idea second sentence: Main idea information is in second
sentence of text

v63--Main idea first short paragraph: Main idea information is in
first short paragraph (100 words or less, excluding instances of
v61 and v62)

v64--Main idea opening second paragraph: Main idea information is in
first sentence of second paragraph

v65--Main idea middle text: Main idea information is near middle of
passage

v66--Main idea final short paragraph: Main idea information is in
last short paragraph (100 words or less in paragraph, excluding
instances of v41)

v67--Main idea last text sentence: Main idea information is in last
sentence of text

v68--Main idea no specific location: Main idea information is not
located in any specific part of the text

v69--All early main idea locations: Sum of v61, v62, and v63.
Several of the analyses below used only this combined category
(that is, v69 = v61 + v62 + v63), since this was found to improve
predictability of some of the criterion variables in our earlier
reading study (Freedle & Kostin, 1990).

Text by item interactions applicable to inferences and explicit 
statement items 



v70--Easily found word same sentence: Stem sends you to unique
easily found word in text and relevant information is in that same
sentence. (Easily found means that the word stands out from the text
because it is in caps or quotes or involves special letters [as in
"CS103"].)

v71--Easily found word next sentence: Stem sends you to unique
easily found word in text but relevant information is in next
sentence.

v72--Unique word same sentence: Stem sends you to unique word in
text (but it is not easily discriminated from rest of text) and
relevant information is in same sentence.

v73--Unique word next sentence: Same as v72 except relevant
information is in next text sentence.
v74--Unique word previous sentence: Same as v72 except relevant
information is in previous text sentence.

v75--Unique word later sentence: Same as v72 except relevant
information is later in same paragraph, not the next sentence.
v76--Unique word earlier sentence: Same as v72 except relevant
information is much earlier in same paragraph.

v77--Unique word different paragraph: Same as v72 except relevant
information is in a different paragraph.

v78--Key word multiple places: Stem suggests a particular topic but
that topic is mentioned more than once.

v79--Information in first sentence: Relevant information is in first
text sentence

v80--Information in second sentence: Relevant information is in
second text sentence

v81--Information in first short paragraph: Relevant information is
in first short paragraph of 100 words or less (but not in first two
sentences).

v82--Information in opening second paragraph: Relevant information
is in first sentence of second paragraph

v83--Information in last sentence: Relevant information is in last
sentence of text.

v84--Information last short paragraph: Relevant information is
earlier, as in a last short paragraph of 100 words or less (but is
not in last sentence).

v85--Information middle of text: None of v79 to v84 applies, but
relevant information is located more in middle of text.

v86--Information from two paragraphs: Relevant information must be
integrated from two text paragraphs.

v87--No directive stem information: No information in stem leads you
to a specific place in text (e.g., stem reads "According to the
passage which of the following statements is true: ...").
v88--Words before critical information: Number of words in passage
you have to read before the sentence containing the relevant
information begins.

v89--Words in relevant paragraph: Number of words in paragraph in
which the relevant information is located.

v90--Information middle relevant paragraph: Relevant information is
in middle of a paragraph rather than the first or last sentence of
that paragraph.

v91--Sum early information codes: that is, v79 + v80 + v81.
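The word counts v88 and v89 and the composite codes are likewise mechanical once a judge has identified the relevant sentence. The sketch below is minimal and makes two assumptions. The passage is supplied as a list of paragraphs, each a list of sentence strings. The judge's coding supplies the 0-based paragraph and sentence indices of the relevant information. The function names and code keys are hypothetical. One summing helper covers the three composites defined in the variable list: v59 = v56 + v57 + v58, v69 = v61 + v62 + v63, and v91 = v79 + v80 + v81.

```python
def n_words(sentence: str) -> int:
    # Assumption: words are whitespace-delimited tokens.
    return len(sentence.split())


def words_before_relevant(passage: list[list[str]], par_idx: int, sent_idx: int) -> int:
    """v88: words read before the sentence holding the relevant information begins."""
    earlier_paragraphs = sum(n_words(s) for par in passage[:par_idx] for s in par)
    earlier_sentences = sum(n_words(s) for s in passage[par_idx][:sent_idx])
    return earlier_paragraphs + earlier_sentences


def words_in_relevant_paragraph(passage: list[list[str]], par_idx: int) -> int:
    """v89: words in the paragraph containing the relevant information."""
    return sum(n_words(s) for s in passage[par_idx])


def composite(codes: dict[str, int], components: tuple[str, ...]) -> int:
    """Sum of 0/1 location codes or counts, e.g. v91 = v79 + v80 + v81."""
    return sum(codes[name] for name in components)


REFERENCE_SUMS = ("v56", "v57", "v58")          # gives v59
ALL_EARLY_MAIN_IDEA = ("v61", "v62", "v63")     # gives v69
SUM_EARLY_INFORMATION = ("v79", "v80", "v81")   # gives v91
```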
Dependent variable

v92--Item difficulty: Item equated delta (referred to as just
"delta")

The dependent variable is an item's equated delta, an index of item
difficulty that converts the percentage of examinees answering correctly on
each test form to a common scale with a mean of 13.0 and a standard deviation
of 4.0; higher deltas indicate harder items. See above for a more detailed
description of equated delta.
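For readers unfamiliar with the delta scale, the conventional ETS delta maps the proportion p of examinees answering an item correctly onto a normal-deviate scale: delta = 13 + 4z. Here z is the standard normal deviate below which a proportion 1 - p of the area falls, so harder items receive larger deltas. The second function is an assumption, because this section does not spell out the equating step. It is a generic mean-sigma linear transformation that places one form's deltas on a reference scale by matching the mean and standard deviation of items common to both. Both function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def delta(p_correct: float) -> float:
    """Conventional ETS delta: 13 + 4z, with z the unit-normal deviate
    below which a proportion (1 - p_correct) of the area falls."""
    if not 0.0 < p_correct < 1.0:
        raise ValueError("p_correct must lie strictly between 0 and 1")
    return 13.0 + 4.0 * NormalDist().inv_cdf(1.0 - p_correct)


def mean_sigma_equating(form_deltas: list[float], reference_deltas: list[float]):
    """Hypothetical equating step: map observed deltas for common items onto a
    reference scale via d_eq = a*d + b, matching means and standard deviations."""
    a = stdev(reference_deltas) / stdev(form_deltas)
    b = mean(reference_deltas) - a * mean(form_deltas)
    return lambda d: a * d + b
```

An item answered correctly by half the examinees sits at delta 13.0. One answered correctly by about 84% sits near 9.0, one standard deviation (4 points) easier.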

In scoring items, the structure and content of item stems, correct
options, and incorrect options were recorded using the 19 variables listed
above (3 of these 19 being the code for item type). A related set of
variables was scored to capture the passage information, but it included
additional variables that were unique to the text structure (see variables
listed above). In all, there are 39 text variables that apply to each of the
three item types. There are also 9 text by item variables for main idea
items; for inference as well as explicit statement items there are 22 text by
item variables.
Results and Discussion

Table 1 presents data that help to identify those variables that will be
important in predicting reading item difficulty. In Table 1 we see that 42
different variables yield a significant correlation with item difficulty
(equated delta). First we will use portions of Table 1 to assess the apparent
adequacy of hypothesis 1 for each of the 10 categories listed under the
hypothesis. We are primarily interested here in how well the text and
text-related variables satisfy hypothesis 1 (because of the Royer, 1990,
critique of multiple-choice tests of reading comprehension); significant
effects of these categories for item variables, however, will also be pointed
out. Also, we do not predict that each reading item type should necessarily
reveal a significant correlation with each of the 10 categories listed under
hypothesis 1; we do expect, however, to find some evidence, pooled over all
three reading item types, that establishes the argument that a multiple-choice
format yields findings similar to those reported in the experimental
literature, where other response formats (such as recall) have generally been
employed.

Correlates of Reading Item Difficulty as Determined by the Categories of
Hypothesis 1.

a. As expected, text negations (v60--text negatives) do
significantly influence comprehension difficulty. Main idea items
significantly correlate with text negations in the expected direction: the
more text negations, the harder the main idea item. However, for the item
variables we see that inference items correlate significantly with the item's
correct option negations (v13--negative correct), but the negative sign
obtained for correct option negations is in the opposite direction of that
expected. Since, for inference items, the same negative sign occurs for the
item's stem negations (v7--negative stem), it might be that a matching
operation across the item's stem and its correct option is, in part,
accounting for this facilitative effect of negations; this will have to be
studied in more detail elsewhere. These latter unexpected findings with
respect to the sign of the correlation coefficient for negatives are the only
instance in this study where the findings contradict the expected directional
prediction based on the literature review. So, regarding negations, only the
text negations relate as expected to reading difficulty, while the item
negations correlate significantly but in the opposite direction of that
expected. Hence negations cannot be counted as either confirming or
disconfirming one of the 10 categories under hypothesis 1.

b. As predicted, the number of text referentials is significantly
related to reading difficulty, here for main ideas, for variables v57, v58, and
v59--reference across text clauses, frequency special reference, and reference
Table 1

Correlations of Significant Item and Text Variables with
Equated Delta for Three GRE Reading Item Types

                                          Significant Correlation of Delta with
                                                Three Reading Item Types
Variable                                   Main Idea    Inference    Explicit
                                           (n = 76)     (n = 87)     (n = 81)

v7  Negative stem
v12 Words correct
v13 Negative correct
v14 Fronting correct
v15 Reference correct
v16 Words incorrects
v19 Reference incorrects
v21 Concreteness
v24 Natural science
v26 Humanities
v27 Natural science excerpt
v29 Argument
v34 Compare-alternative
v36 Coherence
v41 Longest paragraph words
v44 Text sentence words
v46 First paragraph sentence length
v47 Longest paragraph sentence length
v50 Percent fronted text clauses
v51 Frequency fronted text clauses
v52 Frequency combinations of fronted text structures
v53 Frequency of text clefts
v54 Longest fronted run
v55 Text questions
v57 Reference across text clauses
v58 Frequency special reference
v59 Reference sums
v60 Text negatives
v61 Main idea first sentence
v65 Main idea middle text
v69 All early main idea locations
v70 Easily found word same sentence
v73 Unique word next sentence
v77 Unique word different paragraph
v79 Information in first sentence
v80 Information in second sentence
v82 Information in opening second paragraph
v83 Information in last sentence
v85 Information middle of text
Table 1 (Continued)

Variable                                   Main Idea    Inference    Explicit

v89 Words in relevant paragraph            NA           --           .20*
v90 Information middle relevant paragraph  NA           .39***       .28***
v91 Sum early information codes            NA           --           -.29***

(a) A positive correlation for delta means the variable makes the items
harder. *** = significant at p < .01, 2-tail; ** = significant at p < .05,
2-tail; * = p < .06, 2-tail; ++ = p < .05, 1-tail; + = p < .06, 1-tail.
NA = not applicable. If a variable was not significant for the 2-tail
test but appeared as one of the variables listed under hypothesis 1,
where direction was predicted, we applied a 1-tail test. Also, if a main
idea variable was not significant at the 2-tail test and it was
significant for our earlier SAT main idea data (Freedle & Kostin, 1990),
we again applied a 1-tail test.
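The significance conventions in the note to Table 1 can be sketched as follows. A 2-tail test is applied first, and a 1-tail test replaces it only when the direction was predicted. This is an approximation under stated assumptions. It uses Fisher's z transformation, z = atanh(r) * sqrt(n - 3), with a normal reference distribution. A statistics package would instead report the exact t test on n - 2 degrees of freedom. The function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Optional


def corr_p_values(r: float, n: int) -> tuple[float, float]:
    """Approximate (two-tailed, one-tailed) p-values for H0: rho = 0.

    Fisher's z, atanh(r) * sqrt(n - 3), is referred to the standard normal;
    the one-tailed value is for the direction of the observed sign.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    upper_tail = 1.0 - NormalDist().cdf(abs(z))
    return 2.0 * upper_tail, upper_tail


def significance_label(r: float, n: int, predicted_sign: Optional[int] = None) -> str:
    """Table 1 style markers. A 1-tail test is used only when the 2-tail test
    fails and the observed sign matches a predicted direction (+1 or -1)."""
    two_tailed, one_tailed = corr_p_values(r, n)
    if two_tailed < 0.01:
        return "***"
    if two_tailed < 0.05:
        return "**"
    if two_tailed < 0.06:
        return "*"
    if predicted_sign is not None and predicted_sign * r > 0:
        if one_tailed < 0.05:
            return "++"
        if one_tailed < 0.06:
            return "+"
    return ""
```

For example, under this approximation the inference-item correlation of .39 for v90 (n = 87) clears the p < .01 two-tailed threshold.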
sums, respectively. Also, for explicit statement items we get a significant
relationship for variables v15 (reference correct) and v19 (reference
incorrects), which also involve referentials. In general, the more text
referentials present (v57, v58, v59--reference across text clauses, frequency
special reference, reference sums, respectively), the harder the comprehension
process is for main ideas. The number of referentials in the item structure
itself (v15--reference correct; v19--reference incorrects) also makes explicit
statement items more difficult.

c. In line with our general prediction, we see that at least one of the
text's rhetorical organizers (v34--compare-alternative) is significantly
correlated with explicit statement item difficulty: the compare-alternative
organizer (v34) makes the item easier. V29 (argument) for main idea items can
also be counted as among the text's rhetorical organizers.

d. As generally predicted, the number of fronted structures in the text,
as measured by variables v50, v51, v52, v53, and v54 (percent fronted text
clauses; frequency fronted text clauses; frequency combinations of fronted
text structures; frequency of text clefts; longest fronted run), makes main
idea items harder. V50 and v51 (percent fronted text clauses and frequency
fronted text clauses) both deal with the sum of all singly fronted types
(clefts and marked topics), while v52 (frequency combinations of fronted text
structures) deals with combinations of fronted types, such as clefts combined
with coordinations. Also, we note that for the item's correct option, v14
(fronting correct) makes the item harder for both inference and explicit
statement items.

e. Contrary to our hypothesis, vocabulary (v20) did not show a significant
effect.

f. A 1-tailed test suggests that main ideas become more difficult the
longer the sentences in the text are (v44--text sentence words; v47--longest
paragraph sentence length). The text variable v46 (first paragraph sentence
length) also contributes to item difficulty for explicit statement items.

g. A text's longest paragraph length (v41--longest paragraph words) is
marginally significant by a 1-tail test for main idea items in the expected
direction. Also, later in this report, we demonstrate that for long passages
(greater than 400 words) there is a significant relationship (p < .01, 2-tail)
of a text's longest paragraph length with main idea difficulty.
h. Number of paragraphs (v37--paragraphs) did not have a significant
correlation with any of our three reading item types.

i. As predicted, the concreteness (v21) of the text showed 
a significant effect; it is significant for both inference and explicit 
statement item types. (Concreteness of text makes these item types easier.) 

j. As predicted, the following location variables are significantly
correlated in the expected direction with reading difficulty: v61, v65, v69,
v79, v80, v82, v83, v85, v90, and v91 (main idea first sentence, main idea
middle text, and all early main idea locations; information in first sentence
for explicits and inferences; information in second sentence for explicits;
information in opening second paragraph for inferences; information in last
sentence for inferences; information in middle of text for explicits and
inferences; information in middle relevant paragraph for explicits and
inferences; and sum early information codes for explicits).

Excluding negations, 7 of the 10 categories listed under hypothesis 1
generally show the expected significant relationship of the text, item, and/or
text by item interaction variables with one or more of the reading item types.
Taken together, these results appear to confirm that the variables reported in
the empirical literature as important for comprehension also contribute to
comprehension difficulty in a multiple-choice testing format.

Concerning only the text and text-related variables, there were also
7 significant categories. This particular result suggests that a
multiple-choice format does not interfere with assessing passage comprehension
and yields results similar to those found in the experimental literature;
hence these particular text correlational results call into question some of
Royer's (1990) criticisms of multiple-choice tests of reading.

We next take up each item type and explore in greater detail the
variables already mentioned above, plus additional variables that were not
included among our set of categories for hypothesis 1. In the interest of
brevity we restrict our comments to just those 27 variables that yielded
correlations with significance levels of p < .05, 2-tail, or better.

Correlational Results for Main Idea Items. The more negations in the passage
(v60--text negatives), the harder it is for examinees to determine the main
idea. Main idea response options are almost always stated in affirmative
terms. Thus, this result suggests that an extra step may be involved in
restating relevant text segments so as to agree with the positively stated
item alternatives. This extra step introduces the possibility of error and
hence makes such items more difficult.

If a passage is highly coherent (v36--coherence), it is easier to get
the main idea item correct. A highly coherent text repeats one or more of the
lexical concepts of the opening sentence throughout all subsequent paragraphs
of the passage. Also, we see that if the main idea is located in the middle
of the passage (v65--main idea middle text), it makes the main idea item
harder.

All remaining significant correlations (p < .05, 2-tail) for main idea
items deal with the use of text referentials (v58--frequency special
reference; v59--reference sums) and text frontings (v50, v51, v52--percent
fronted text clauses, frequency fronted text clauses, frequency combinations
of fronted text structures), all of which make main idea items more difficult.
If many "special" referential pronouns are included in the text (such as the
indefinite "one can see that ..." or the indefinite "we know that ...."), this
appears to make it more difficult to clearly identify what the main idea of
the passage is. Similarly, the sum of all pronominal referentials scored
(within-clause and across-clause pronouns, including special pronouns) also
appears to make main idea items harder (of course, this could be due solely to
the inclusion of special pronouns in this summed score; later, in the
regression analyses, we shall be able to partial out the redundancy among these
several scores). The various text frontings also make main idea items harder;
these scores reflect qualifying phrases and so forth that occur prior to the
main sentence subject. Apparently this qualifying information makes the
examinee less certain about what the key idea is.

Correlational Results for Inference Items. As Table 1 also shows for a 2-tail
test, three variables contribute to making inference items more
difficult: v16 (words incorrects), v85 (information middle of text), and v90
(information middle relevant paragraph). As the number of words in the
incorrect options (v16--words incorrects) increases, so does the difficulty of
an inference item. Presumably longer incorrect options increase the number of
different concepts that have to be compared against the relevant text
information; this in turn increases the number of processing steps prior to
making a final decision.

If the relevant information is in the middle of the passage
(v85--information middle of text) or in the middle of a particular text
paragraph (v90--information middle relevant paragraph), examinees find it more
difficult to find this information and thereby select the appropriate response
option.






The following variables appear to make inference items easier: v13,
v21, v70, v79, v82, v83--negative correct, concreteness, easily found word
same sentence, information in first sentence, information in opening second
paragraph, and information in last sentence. If the passage has a concrete
orientation (v21--concreteness), the inference item is easier; presumably the
ability to visualize a concrete set of text concepts improves the precision
with which an inference can be drawn. If the inference stem sends the
examinee to relevant places in the text that are easy to locate (v70--easily
found word same sentence), or if the relevant information is at the beginning
or end of the text (v79, v82, v83--information in first sentence, information
in opening second paragraph, information in last sentence), this contributes
to making the inference item easier.

The only finding that is difficult to explain is that v13 (negative correct)
facilitates making a correct inference! Our prior work using SAT reading
comprehension items (Freedle & Kostin, 1990) suggested that the presence of a
negation generally makes an item more difficult. It is possible, of course,
that by chance the relevant text assertions may themselves be stated in the
negative; if so, it would be easier to confirm a negative text statement if an
item's correct option were also stated in the negative. However, we have not
checked inference item instances to verify whether this conjecture is the
likely explanation.

Correlational Results for Explicit Statement Items. Eight variables
contribute to making explicit statement items more difficult by a
2-tail test: v12, v15, v16, v19, v26, v77, v85, v90--words correct, reference
correct, words incorrects, reference incorrects, humanities, unique word
different paragraph, information middle of text, information middle relevant
paragraph. As the number of words increases in the correct (v12--words
correct) and/or incorrect options (v16--words incorrects), the potential
number of different concepts that have to be compared against the text
increases; thus the item becomes more difficult. As the number of referential
expressions increases among the correct (v15--reference correct) and/or
incorrect options (v19--reference incorrects), there is also an increase in
the number of cognitive operations needed to locate the appropriate
referential expression, and thus an increase in item difficulty (also see
Clark & Haviland, 1977, for a related finding). If the relevant information
is embedded in the middle of the text (v85--information middle of text) or the
middle of a particular paragraph (v90--information middle relevant paragraph),
or is in a different paragraph from the expected one (v77--unique word
different paragraph), this also contributes to explicit statement item
difficulty. Finally, if the passage belongs to the humanities (v26), it is
perhaps more difficult to locate or correctly interpret the relevant text
information.

The following variables make explicit statement items easier: v21, v24,
v27, v34, v91--concreteness, natural science, natural science excerpt,
compare-alternative, sum early information codes. It is somewhat surprising
to find that the concreteness of the text (v21--concreteness) can also
facilitate the locating of explicit information. Presumably, scanning the
details of a concrete passage in order to locate explicit information is
faster and more accurate than scanning an abstract passage. Also, if the
passage is a science excerpt (v27--natural science excerpt), explicit
statement items are easier; however, most science excerpts turn out to be
concrete passages, so the significance of v27 (natural science excerpt) is
probably not independent of the significance of v21 (concreteness). Similarly,
the presence of natural science content (v24--natural science) facilitates
correctness; but again, most of the natural science passages are concrete and
are also primarily science excerpts.

If the passage has a top-level compare-alternative structure
(v34--compare-alternative), this aids getting the item correct. Finally, if
the explicit information is in an easily found passage location (e.g., at the
beginning of the passage, v91--sum early information codes), the explicit item
is easier.






General Comparison of Three Reading Item Types. From the above correlational
results we point out a few of the similarities among the three item types.
All three item types show v16 (words incorrects) as significant. Text
concreteness (v21--concreteness) contributes similarly to item difficulty for
explicit and inference items: concrete texts are easier for both item types
than are abstract texts. All three item types show similar locational
effects: relevant information located in the middle of the text tends to make
an item more difficult (see v65, main idea middle text, and v85, information
middle of text, for inference and explicit statement items; also see v90).
Also, for all three item types, when the information occurs in the first
sentence of the passage the items are easier (see v61, main idea first
sentence, and v79, information in first sentence, for inference and explicit
statement items). These locational effects are not necessarily unexpected,
because Kintsch's (1974) earlier theory suggests that early text information
(which generally is easier to access and/or remember) is often the important
main idea information as well, whereas less important information often tends
to occur in the middle of a passage and thus is less readily accessed.

However, in spite of the similarities noted above, cognitively it seems
self-evident that main idea items should generally not be analyzed with other
reading item types, especially explicit statement items (e.g., "According to
the passage, x means the following ...."). That is, examining an entire text
for its overarching theme cannot be equivalent in all of its cognitive
processing steps to confirming or disconfirming a particular statement in the
passage (this being an explicit statement item).

In support of this assertion we can note the following differences in
item features. In our sample of items, main idea item stems never used a
negation, whereas inference and explicit items showed a moderate use of
negation in the stem. Main idea items virtually never employed "fronted"
structures for the stems, but inference and explicit items showed a strong use
of fronted structures for the item stem. Main idea stems never sent the
examinee to a unique word in the text or to a specific topic or phrase in the
text; however, inference and explicit item stems often mentioned a particular
word or phrase to be searched for in the text. Also, main idea and inference
items showed virtually no use of frontings for correct and incorrect options,
whereas explicit items showed a moderate use of frontings for correct and
incorrect options.
Contrasts and Similarities Among Three Reading Item Types. The above
observations strongly suggest that the three item types typically differ in
several structural features; nonetheless, we have also noted that the search
process for locating the correct option sometimes yields convergent results
among the three item types. Earlier studies (Drum et al., 1981; Embretson &
Wetzel, 1987) unfortunately did not analyze each reading item type separately,
probably as a consequence of their relatively small item sample sizes.
Hence we believe that the current study may be the first effort to illustrate
some similarities and differences among these three reading item types.

Not all of the correlations in Table 1 provide independent information
concerning item difficulty. By conducting several stepwise regression
analyses, we will be able to determine the following:

(a) What is the overall predictability of GRE reading comprehension item
difficulty; that is, how much of the variance of the difficulty index (equated
delta) can be accounted for?

(b) How many of the 10 category variables (listed under hypothesis 1)
provide independent information for each of the three reading item types? This
provides us with a clear test of hypothesis 2 (and its corollary).

Stepwise regression results for each of three reading item types. The
next set of results examines the outcome of stepwise regression analyses for
each of the three reading item types.

Criteria for admitting variables into the stepwise regressions. For all
stepwise regressions, the following criteria were used for admitting variables
into the regression. All relevant variables were available for possible
selection. Each new variable admitted into the solution had to yield a
significant individual F value, and the new F values for all previously
admitted variables had to remain significant. If the next variable admitted
showed a nonsignificant F, the previous solution was considered the final one.

Regarding the critical ratio of items to extracted variables, experts seem to
differ. Some recommend, for example, for a 90-item sample, that no more than
nine variables be extracted, providing each new variable yields a significant
F value and all previous variables still retain a significant F value. Other
experts (Cohen & Cohen, 1983) suggest that no more than three variables be
extracted from a sample of 90 items. Yet others suggest that it is not the
ratio of items to number of extracted variables that is so critical; of more
importance is the difference between the number of items and the number of
predictor variables (C. Lewis, 1991, personal communication).

Because many of the variables selected for analysis were already
reported in the literature as significantly related to reading comprehension
(e.g., the 10 categories listed under hypothesis 1), we feel the less
restrictive ratio is more appropriate, especially if we wish to provide a fair
test of hypothesis 2 (which assesses whether many categories of variables
simultaneously provide independent variance regarding item difficulty).
However, for completeness, we will also indicate for each analysis below which
variables would have been deleted had the more restrictive ratio (1 out of 30)
been used. In addition (see notes 4, 5, 6), we report how the
regression results would be altered had only the variables that were
significantly correlated with delta been used in the stepwise procedure (this
being one way to restrict the number of predictor variables).

Overall Predictability of Item Difficulty: Evaluation of Hypothesis 2 

Stepwise regression analysis of all main idea items. As Table 2 
shows, three significant variables (v58, v44, v46: frequency special 
reference, text sentence words, first paragraph sentence length) account for 
20% of the variance of main idea difficulty. (The more restrictive selection 
ratio of 1 out of 30 results in the same three predictors being extracted.) 
Thus, without additional analyses, only 20% of the variance in main idea item 
difficulty (equated delta) can be accounted for; while significant (p < .01), 
this does not compare favorably with the 58% of main idea variance accounted 
for when SAT reading comprehension items were examined (see Freedle & Kostin, 
1991). Below we conduct subanalyses that explore the source of this 
difference between the SAT and GRE main idea data. 
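The overall F values in the footnotes to Tables 2 through 4 follow from R², the number of predictors k, and the number of items n. They are linked by the standard identity F = (R²/k) / ((1 - R²)/(n - k - 1)) on k and n - k - 1 degrees of freedom. The sketch below (function name hypothetical) recomputes them from the rounded R² values. Differences of a few tenths, for example about 7.9 versus the reported 7.7 for the short passages, are consistent with R² having been rounded to two decimals.

```python
def overall_f(r2, k, n):
    """Overall regression F for k predictors and n cases: (R^2/k) / ((1 - R^2)/(n - k - 1))."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# (R^2, predictors, items) as reported in the footnotes to Tables 2-4.
reported = {
    "main ideas, all passages": (0.20, 3, 76),
    "main ideas, long passages": (0.52, 5, 38),
    "main ideas, short passages": (0.41, 3, 38),
    "inferences": (0.49, 7, 87),
    "explicit statements": (0.41, 7, 81),
}
for label, (r2, k, n) in reported.items():
    print(f"{label}: F({k},{n - k - 1}) = {overall_f(r2, k, n):.1f}")
```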

Regarding hypothesis 2, the three variables extracted for the 76 main 
idea items suggest that at least two categories listed under hypothesis 1 
contribute independent information concerning item difficulty: v58 (frequency 
special reference), representing the referential category, and v44 (text 
sentence words) plus v46 (first paragraph sentence length), both representing 
the sentence length category. (We must await the regression analyses of the 
remaining two reading item types before we can fully evaluate the degree to 
which hypothesis 2 holds.) 

Another issue should be pursued before we examine the other two reading 
item types. Because of the somewhat low predictability for all GRE main idea 
items, we now explore whether part of the low predictability might be due to 
Table 2 

Stepwise Regression Analysis for Predicting 76 GRE Main Idea Item 
Difficulty (Equated Delta) 

                                          F value of       Percent 
Variable                                  each predictor   variance   Source 
                                          (a, b) 

All passages (n = 76 items) 
v58  Frequency special reference           8.8              9%        text 
v44  Text sentence words                   9.2              6%        text 
v46  First paragraph sentence length       3.9              4%        text 

Long passages (n = 38 items) 
v41  Longest paragraph words              15.5             18%        text 
v65  Main idea middle text                 5.4              9%        text by item 
v43  Longest paragraph sentences          [illegible]       8%        text 
v55  Text questions                        4.7              9%        text 
v51  Frequency fronted text clauses        4.2              6%        text 

Short passages (n = 38 items) 
v36  Coherence                             9.8             23%        text 
v52  Frequency combinations of fronted     5.0             11%        text 
     text structures 
v16  Words incorrects                      4.0              7%        item 

a For all main idea items, overall F(3,72) = 5.9, p < .01; multiple R = .444, 
  R squared = .20. 
  Main idea items for long passages only: overall F(5,32) = 6.9, p < .01; 
  multiple R = .72, R squared = .52. 
  Main idea items for short passages only: overall F(3,34) = 7.7, p < .01; 
  multiple R = .64, R squared = .41. 
b All F values for the individual variables are significant at p < .05 or 
  beyond. These values are taken from the best stepwise regression solution. 
the fact that we combined both short and long passages in the same analysis, 
possibly masking predictability by combining potentially different search 
strategies as a function of passage length. GRE passages are very clearly 
divided into short and long passages: a passage of fewer than 200 words is 
classified as short, and one of more than 400 words as long. There is thus a 
substantial gap in length between long and short GRE passages; no passage is 
between 200 and 400 words long. 
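The length rule can be stated as a small function for splitting items into the two subsamples analyzed below. The thresholds come from the text. The item identifiers and word counts in the example are hypothetical. The text does not say how a passage of exactly 200 or 400 words would be handled, so this sketch treats any count from 200 to 400 as an error.

```python
def passage_length_class(word_count):
    """Return 'short' for passages under 200 words and 'long' for passages over 400 words."""
    if word_count < 200:
        return "short"
    if word_count > 400:
        return "long"
    # The text reports no GRE passage in this range; boundary handling is an assumption.
    raise ValueError(f"{word_count} words falls in the 200-400 word gap")


def split_by_length(items):
    """Group (item_id, passage_word_count) pairs into short- and long-passage subsamples."""
    groups = {"short": [], "long": []}
    for item_id, words in items:
        groups[passage_length_class(words)].append(item_id)
    return groups


# Hypothetical item identifiers and passage lengths.
print(split_by_length([("item_a", 150), ("item_b", 450), ("item_c", 520)]))
```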

The reader is reminded that Kieras' (1985) original study of where the 
main idea of a passage is located was based on single paragraphs, not 
multiple paragraphs. It occurred to us that combining multiple-paragraph 
passages with single paragraphs (virtually all of which have fewer than 200 
words in our GRE sample) could be the source of the low predictability for 
the full sample of main idea items. 

Thus separate regressions for main ideas were conducted on just the 
short and just the long passages. Table 2 also shows the results of these 
additional stepwise regressions. The amount of delta variance accounted for 
appears to be substantially improved. Fifty-two percent of the variance of 
items associated with the long passages can now be accounted for, as opposed 
to 20% for the full (long and short) main idea sample. For the short 
passages, 41% of the variance of main idea deltas can now be accounted for. 
Interestingly, however, it is the long passages that show a significant 
Kieras-type result (i.e., a middle-of-passage difficulty effect, v65 [main 
idea middle text]), not the short passages. Perhaps the very brevity of the 
short passages diminishes the impact of a middle-of-passage effect. 
Subdividing by passage length clearly seems to have improved main idea 
predictability, although the small sample sizes mean that we cannot place 
too much confidence in this result. 

We should note that using the more restrictive selection ratio of 1 out 
of 30 would alter our analyses of the long and short passages for main idea 
items: only one variable would be allowed in each subanalysis. For long 
passages, this would result in 18% of the variance being accounted for by v41 
(longest paragraph words), the same variable that emerged as one of the 
significant independent predictors in our SAT main idea sample (see Freedle & 
Kostin, 1991). For the short passages, the best single predictor variable 
(v36, coherence) would account for 23% of the variance. Again, this variable 
was one of the significant independent predictors in our SAT main idea sample 
(Freedle & Kostin, 1990). 

A comparison of significant differences between the correlational results 
for the long and short main idea subsamples (see Appendix) provides further 
empirical evidence justifying a separate analysis by passage length. For 
example, the correlations of the following variables with delta differed 
significantly between long and short passages (McNemar, 1956; p < .05, 
2-tail, for all comparisons): v36 (coherence), v37 (paragraphs), v12 (words 
correct), v16 (words incorrects), v41 (longest paragraph words), and v45 
(text paragraphs words). It is not surprising to find v37, v41, and v45 
(paragraphs, longest paragraph words, and text paragraphs words) showing a 
significant difference as a function of passage length; more interesting is 
the difference for v36, v12, and v16 (coherence, words correct, words 
incorrects). Coherence (v36) affects short-passage item difficulty more than 
long-passage item difficulty. (Mean coherence, incidentally, does not differ 
significantly between long and short passages: for short passages mean 
coherence = 1.74 [SD = 1.13]; for long passages, 1.54 [SD = 1.13].) We do 
not have a clear explanation for why coherence significantly affects main 
idea difficulty for short passages but not for long ones. The difference 
between long and short passages in the effect of v12 (words correct) and v16 
(words incorrects) is likewise not obvious: for short passages, as the 
options become longer (i.e., use more words), the item becomes more 
difficult, whereas for long passages, as the options become longer, the item 
becomes easier. This reversal also remains unexplained. 
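The report cites McNemar (1956) for these comparisons but does not give the formula. The check below assumes the standard Fisher r-to-z test for two correlations from independent samples; the long and short subsamples involve different items. It applies that test to the v36 comparison, using the long- and short-passage correlations from Table 5. The function name is hypothetical.

```python
import math


def compare_independent_rs(r1, n1, r2, n2):
    """Two-tailed Fisher r-to-z test for correlations from two independent samples."""
    z = (math.atanh(r1) - math.atanh(r2)) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    p = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for a standard normal Z
    return z, p


# v36 (coherence) from Table 5: long passages r = -.04, short passages r = -.48, 38 items each.
z, p = compare_independent_rs(-0.04, 38, -0.48, 38)
print(f"z = {z:.2f}, two-tailed p = {p:.3f}")
```

With 38 items in each subsample, z is about 2.02 and the two-tailed p about .04, in line with the significant difference reported above.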

Nevertheless, these results, especially for the long and short passages, 
must be counted as exploratory, inasmuch as we have too few main idea items 
and too many predictor variables to guarantee a stable result. Later in this 
report we show that, in spite of these limitations, there is evidence that 
GRE main idea reading items overlap considerably with SAT main idea reading 
items regarding which variables correlate significantly with delta; however, 
close inspection of the findings suggests that it is the longer GRE passages 
that primarily account for this similarity across data sets. Such 
replication nonetheless adds weight to the current set of findings (see 
note 4). 

Stepwise Regression Results for Predicting Inference Item Difficulty 

In Table 3 we see that seven variables yield independent information 
concerning inference item difficulty; jointly they account for 49% of the 
item difficulty variance. The following bear on the overall predictability 
of inference item difficulty with respect to the categories of hypothesis 1: 
concreteness (v21) and location (v90, v79, and v82: information middle 
relevant paragraph, information in first sentence, information in opening 
second paragraph). Inferences thus contribute two additional significant 
categories favoring hypothesis 2. Incidentally, using 
Table 3 

Stepwise Regression Analysis for Predicting GRE Inference Item Difficulty 

                                          F value of       Percent 
Variable                                  each predictor   variance   Source 
                                          (a, b) 

All passages (n = 87 items) 
v90  Information middle relevant          22.0             15%        text by item 
     paragraph 
v79  Information in first sentence        15.2             13%        text by item 
v82  Information in opening second         9.9              5%        text by item 
     paragraph 
v7   Negative stem (c)                     6.2              4%        item 
v16  Words incorrects                      7.2              4%        item 
v21  Concreteness                          5.6              3%        text 
v86  Information from two paragraphs       4.6              3%        text by item 

a Overall F(7,79) = 10.8, p < .01; multiple R = .70, R squared = .49. 
b All F values for individual variables are significant at p < .05 or beyond. 
c This result for v7 is the only place in our regression analyses where the 
  direction of the relationship is opposite to the expected. 
the more restrictive selection ratio of 1 out of 30 would mean that only the 
first three variables reported in Table 3 would have been counted as 
significant, accounting for 33% of the variance (see note 5). 

Stepwise Regression Results for Predicting Explicit Statement Item Difficulty 

Table 4 shows that seven independent predictors account for 41% of the 
item difficulty variance. Of these, the following variables bear on 
hypothesis 2: v34 (compare-alternative), a rhetorical organizer; v90 
(information middle relevant paragraph), a location variable; v14 (fronting 
correct); and v15 (reference correct). Of these four categories, two are new 
to our list of those favoring hypothesis 2 (rhetorical organizer and 
fronting). Finally, using the more restrictive selection ratio of 1 out of 30, 
only the first three variables in Table 4 would have been counted as 
significant, accounting for 24% of the variance (see note 6). 

Conclusion regarding evidence favoring hypothesis 2. Hypothesis 2 deals 
with the overall predictability of the three item types with respect to each 
of the 10 categories listed under hypothesis 1. Six of these categories were 
found to provide independent information when pooling across the three 
reading item types: referentials, sentence length, concreteness, rhetorical 
organization, frontings, and location. We therefore conclude that there is 
moderate evidence favoring hypothesis 2: many of the variables reported in the 
experimental literature as individually influencing comprehension difficulty 
are also found here to contribute jointly to comprehension difficulty across 
several different reading item types. The corollary of hypothesis 2 (that 
naturalistic texts will exhibit a similar set of jointly significant 
categories) thus also receives moderate support. 

A Comparative Analysis Using SAT Main Idea Reading Items: Further Evaluations 
of Hypotheses 1 and 2 

An examination of the SAT reading data (Freedle & Kostin, 1991) suggests 
considerable agreement with hypotheses 1 and 2 as just examined for the GRE 
data. In particular, the following categories supported hypothesis 1 for the 
SAT data (see column 1 of Table 5): for SAT main idea items, the correlations 
for negations, referentials, rhetorical organizers, frontings, paragraph 
length, location, and abstractness were significant. For all three SAT 
reading item types combined, the following categories were correlationally 
significant for hypothesis 1: negations, referentials, rhetorical organizers, 
paragraph length, location, and abstractness. All but one (negations) of 
Table 4 

Stepwise Regression Analysis for Predicting GRE Explicit Statements 
Item Difficulty 

                                          F value of       Percent 
Variable                                  each predictor   variance   Source 
                                          (a, b) 

All explicits (n = 81 items) 
v27  Natural science excerpt              11.2              9%        text 
v77  Unique word different paragraph      10.5              8%        text by item 
v34  Compare-alternative                   9.8              7%        text 
v90  Information middle relevant           9.1              5%        text by item 
     paragraph 
v14  Fronting correct                      3.9              5%        item 
v24  Natural science                       5.8              3%        text 
v15  Reference correct                     5.1              4%        item 

a Overall F(7,73) = 7.4, p < .01; multiple R = .64, R squared = .41. 
b The F values for individual variables are all significant at p < .05 or 
  beyond. 

these categories were also found for the GRE. Hence there is substantial 
overlap between the results of the SAT and GRE data sets. 

Regarding hypothesis 2, the SAT data supported at most five 
independently contributing categories jointly influencing the regression 
analysis predicting main idea item difficulty. This should be compared with 
the two categories (sentence length and reference) that jointly supported the 
total GRE main idea sample. [If we consider which categories were significant 
for either the long or short passages in the GRE main idea regression 
analyses, the number of independent categories is three (sentence length, 
location, frontings) rather than two.] Overall, hypothesis 2 appears to be 
modestly supported by both the GRE and SAT main idea data. 

Detailed Comparisons of SAT Reading Items and GRE Reading Items for Main Idea 
Item Difficulty: Correlational Comparisons 

The main purpose of this section is to determine which of the significant 
variables reported by Freedle & Kostin (1991) for the SAT main idea reading 
items replicate for the GRE main idea reading items. (Main idea items were 
scored using an identical set of variables for both the SAT and GRE reading 
items.) 

We see in Table 5 that 22 SAT main idea variables were found to be 
significant (p < .05, 2-tail, or better). In the rightmost column (the total 
GRE main idea sample), 10 of these individual variables were replicated as 
significant: v29, v36, v41, v53, v55, v57, v59, v60, v65, v69 (argument, 
coherence, longest paragraph words, frequency of text clefts, text questions, 
reference across text clauses, reference sums, text negatives, main idea 
middle text, all early main idea locations). Of these, eight variables 
represent six categories that hypothesis 1 predicted would be significant: 
v41 (longest paragraph words) for paragraph length; v29 (argument), which 
implicates the Grimes rhetorical organizers, especially compare-adversative; 
v53 (frontings, here text clefts); v57 (reference across text clauses) and 
v59 (reference sums) for the referential category; v60 (text negatives) for 
the negation category; and v65 (main idea middle text) and v69 (all early 
main idea locations) for the location category. 

Another way to show the similarities between SAT and GRE main ideas is 
to correlate the correlations themselves. The first two columns correlate .66 
(p < .002, 2-tail), while the first and third columns correlate only .43 
(p < .05, 2-tail), showing that the relationships associated with the longer 
GRE passages correlate better with the SAT than do the short GRE passages. 
Table 5 

Significant SAT Correlations of Item Difficulty with the Set of 
Variables as Compared with GRE Correlations: Main Idea Items Only 

          Main Idea         GRE Main Idea Data 
          Significant       Main Ideas,      Main Ideas,       All Main Ideas 
          Variables         Long Passages    Short Passages    (long & short) 
Var (a)   for SAT (b)       for GRE (b)      for GRE (b)       for GRE (b) 
          (n = 110)         (n = 38)         (n = 38)          (n = 76) 

v11        .19**            -.07             -.06              -.07 
v13        .26***           -.11             -.08              -.10 
v21       -.54***           -.02             -.24              -.11 
v24       -.32***           -.07             -.16              -.10 
v27       -.37***           -.07             -.05              -.05 
v28        .20**                              .06               .03 
v29        .39***            .31*             .11               .21++ 
v30       -.28***            .05             -.17              -.03 
v33        .38***            .32**           -.03               .12 
v36       -.24***           -.04             -.48***           -.27** 
v40        .22***            .10             -.22              -.07 
v41        .27***            .43***          -.24               .18+ 
v42        .20**             .12             -.22              -.08 
v43        .28***            .12             -.25              -.04 
v45        .19**             .35**           -.25               .04 
v53        .20**             .15              .14               .18+ 
v55        .28***            .22              .09               .19++ 
v57        .25***            .10              .26+              .18+ 
v59        .24***            .22              .22               .23** 
v60        .35***            .32**            .05               .23** 
v65        .22**             .39**            .02               .27** 
v69       -.25***           -.28++           -.08              -.20** 

a Key for interpreting significant variables: 
  v11  Answer position 
  v13  Negative, correct 
  v21  Concreteness 
  v24  Natural science 
  v27  Natural science excerpt 
  v28  About natural science 
  v29  Argument 
  v30  List/Describe 
  v33  Compare-adversative 
  v36  Coherence 
  v40  First paragraph words 
  v41  Longest paragraph words 
  v42  First paragraph sentences 
  v43  Longest paragraph sentences 
  v45  Text paragraphs, words 
  v53  Frequency of text clefts 
  v55  Text questions 
  v57  Reference across text clauses 
  v59  Reference sums 
  v60  Text negatives 
  v65  Main idea middle text 
  v69  All early main idea locations 

b ***  Correlation significant, p < .01, 2-tail 
  **   Correlation significant, p < .05, 2-tail 
  *    Correlation marginally significant, p < .06, 2-tail 
  ++   Correlation significant, p < .05, 1-tail (justified if we use the 
       earlier SAT findings to predict the sign of the GRE correlation) 
  +    Correlation marginally significant, p < .06, 1-tail 
Interestingly, the correlation of the SAT column with all GRE main ideas is 
also significant, r = .65 (p < .002, 2-tail), but this fails to capture the 
fact that the largest correlations in the last column are somewhat smaller 
than those in the second column. 
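Correlating the correlations is an ordinary Pearson correlation computed across variables rather than across items. The sketch below applies it to the SAT and long-passage GRE columns of Table 5. It omits v28 because its long-passage cell is blank, which is an assumption about how the missing value was handled.

```python
import math

# Table 5 correlations, v11 through v69 in table order, excluding v28.
sat = [.19, .26, -.54, -.32, -.37, .39, -.28, .38, -.24, .22, .27,
       .20, .28, .19, .20, .28, .25, .24, .35, .22, -.25]
gre_long = [-.07, -.11, -.02, -.07, -.07, .31, .05, .32, -.04, .10, .43,
            .12, .12, .35, .15, .22, .10, .22, .32, .39, -.28]


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(f"r(SAT, GRE long) = {pearson(sat, gre_long):.2f}")
```

On these 21 pairs the coefficient comes out at about .66, matching the value reported for the first two columns.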

There is yet a third way to assess the similarity of the two data sets 
presented in Table 5. Comparing just the algebraic signs of the correlations 
for the long-passage GRE sample (column 2) with the SAT (column 1), we find 
that 18 of the 21 values have the same sign; by a sign test this is 
significant (p = .002, 2-tail). However, for the short-passage GRE sample 
(column 3) compared with the SAT, only 14 of 22 are in the same direction, 
which is not significant (p > .20). For the full GRE main idea sample compared 
with the SAT main idea sample, 17 of 22 are in the same direction, which again 
is significant (p = .016, 2-tail, sign test); this significance is due 
primarily to the contribution of the long GRE passages. 
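The sign test used above is an exact binomial test of the number of same-sign pairs against a chance rate of .5. A minimal sketch (function name hypothetical), using the agreement counts reported in the text:

```python
from math import comb


def sign_test_p(agreements, n):
    """Exact two-tailed sign test: probability of a split at least this uneven when p = .5."""
    k = max(agreements, n - agreements)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# Same-sign counts against the SAT column of Table 5, as reported in the text.
for label, same, n in [("GRE long", 18, 21), ("GRE short", 14, 22), ("GRE all", 17, 22)]:
    print(f"{label}: {same} of {n} agree, two-tailed p = {sign_test_p(same, n):.4f}")
```

The exact two-tailed values come out at roughly .0015, .29, and .017, in line with those reported.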
Conclusion 

In this study we have been primarily interested in determining how well 
reading item difficulty can be accounted for by a set of predictors that 
reflect the contribution of the text structure, the item structure, and the 
joint effect of both text and item. We found that a substantial amount of the 
variance can be accounted for by a relatively small set of predictors; the 
variance accounted for ranged from 20% to 52%, depending on the particular 
analysis undertaken. To our knowledge this is one of the few studies to 
examine the predictability of a relatively large sample of multiple-choice 
reading items (n = 244) using a wide selection of predictor variables. 

Within this broader concern we have also focused on a small set of 
hypotheses so as to address more directly a number of claims made in the 
scholarly literature concerning reading comprehension and the adequacy of 
reading comprehension tests per se. In particular, Goodman (1982) has 
complained that many experimental studies of comprehension have focused on 
just one or two variables at a time; he questions whether these separate 
studies taken together necessarily build up our understanding of how full 
comprehension of text takes place. A related concern is whether the often 
highly artificial texts studied in the experimental literature will clarify 
how more naturalistic texts are comprehended. Finally, Royer (1990) and Katz 
et al. (1990) have questioned whether multiple-choice reading tests can be 
considered appropriate tests of passage comprehension, given that item 
content alone (in the absence of the reading passage) can be shown to lead to 
correct answers well above chance levels of guessing. 

In response to these several concerns, the prediction of reading item 
difficulty has been framed around two hypotheses meant to put into clearer 
perspective the viability of multiple-choice reading comprehension tests, here 
exemplified by the GRE reading passages and their associated items. Since 
many of the scored variables deal with text content similar to those of 
concern in the experimental literature and since the GRE reading passages are 
adaptations of prose from naturalistic sources (book passages, magazines, 
etc.), we reasoned that the successful prediction of reading item difficulty 
would allow us to draw several important conclusions. 

The first hypothesis asserts that multiple-choice items will be 
sensitive to a set of variables similar to those found to be important in 
experimental studies of comprehension processes. The evidence was generally 
interpreted as supporting most of the categories detailed under hypothesis 1 
for the text and text-related variables. We take this to mean that 
multiple-choice response formats yield results similar to those found in the 
more controlled experimental studies. Hence we feel that Royer's (1990) 
statement that multiple-choice tests do not measure passage comprehension can 
be called into question. 

A second hypothesis asserts that many of the significant variables will 
be found to jointly influence reading item difficulty. By pooling the 
stepwise regression results across the three reading item types, we concluded 
that there was considerable evidence that many of the different categories of 
variables studied do jointly account for reading item difficulty. This result 
was further interpreted as a response to Goodman's (1982) concern that, since 
many experimental studies involve just one or two variables at a time, they 
may not guarantee that these variables, when studied jointly, will provide any 
cumulative new information about reading comprehension difficulty. Our 
results suggest that many of the different categories of variables do in fact 
provide independent predictive information; hence the few variables studied 
across disparate studies do combine to increase our understanding of what 
influences comprehension difficulty. A related set of analyses using a large 
number (n = 285) of SAT reading items (Freedle & Kostin, 1991) further 
confirmed the viability of this demonstration. 

The fact that the GRE passages were selected from naturally occurring 
prose was further interpreted as evidence that the variables found here to 
predict the difficulty of items associated with these more naturalistic 
passages are similar to those found to predict comprehension difficulty for 
artificially constructed passages (as are many passages in the experimental 
literature). Thus there do not seem to be any large differences between 
studies using naturalistic versus artificially constructed passages in their 
adequacy for studying the factors that influence comprehension difficulty. A 
similar result was obtained in our analyses of SAT data (see Freedle & 
Kostin, 1991); because the SAT passages are also developed from naturally 
occurring prose, this again indicates that the distinction between 
artificially constructed and naturalistic passages matters little for 
assessing the factors that influence reading comprehension. 

In short, we find considerable evidence that multiple-choice tests of 
reading comprehension yield results quite consistent with those obtained from 
controlled experimental studies of language comprehension. More importantly, 
because of the relatively large size of our database, the results also 
provide evidence that many variables affecting comprehension contribute 
independent predictive information in determining reading item difficulty. A 
significant amount of the item difficulty variance has been accounted for by 
a relatively small number of variables for each of three reading item types. 
Finally, the current results show considerable consistency across the SAT 
and GRE data sets. 

Future work should focus on increasing, if possible, the amount of 
variance accounted for in each reading item type. We believe the current 
work represents a significant demonstration of how to conceptualize the 
nature of the prediction of reading item difficulty. 
Notes 

1 

While the Drum et al. (1981) study was innovative in analyzing the 
multiple-choice testing process into its constituent parts (i.e., determining 
the relative contributions of the item's stem, the item's correct and 
incorrect options, and the text variables to item difficulty), some of the 
study's analyses appear to be flawed. Ten predictor variables were extracted 
from very small reading item samples (between 20 and 36 items) taken from 
seven children's reading tests. At most one or two predictors, rather than 10, 
should have been extracted from such small samples (see Cohen & Cohen, 1983); 
hence 70% of the item difficulty variance is probably an overestimate of the 
variance actually accounted for. 
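The concern in this note can be quantified with the commonly used adjusted (shrunken) R², 1 - (1 - R²)(n - 1)/(n - k - 1). Note 1 does not itself apply this correction. The sketch simply shows how much an estimate of R² = .70 with k = 10 predictors shrinks at the two extremes of the 20 to 36 item samples; the function name is hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for k predictors fitted to n cases: 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# R^2 = .70 with 10 predictors, at the smallest and largest sample sizes mentioned in note 1.
for n_items in (20, 36):
    print(n_items, round(adjusted_r2(0.70, n_items, 10), 2))
```

The adjusted values are about .37 for 20 items and .58 for 36 items.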

2 

The possibility of a curvilinear relationship between v92 (item difficulty) 
and each of the predictor variables was examined; there was little evidence 
of any strong curvilinear relationships in the current set of data. 



3 

One might postulate that items dealing with middle-of-text information are 
generally more difficult than those dealing with early text information 
because there is more material to be remembered and processed that might be 
relevant to a particular test item. However, such a straightforward 
explanation would not account for the fact that test items dealing with 
information in the last text sentence are often of only moderate difficulty; 
if having to cover more text material were the source of locational 
difficulty, items dealing with the final text sentence should be the most 
difficult of all. But they are not; in fact, as we see for inferences (in 
Table 1), relevant information in the last sentence actually makes such items 
easier. Kintsch and van Dijk (1978) provide a memory mechanism that does 
account for the observed facts: the reader actively processes a limited 
number of clauses (about four) at a time, including the most recent clause 
along with clauses judged to be important. Since the final sentence contains 
the most recent clause the reader encounters upon finishing the passage, it 
remains in active memory, so an item dealing with that information should be 
relatively easy to answer correctly. 
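The memory argument in this note can be illustrated with a deliberately simplified toy, which is not an implementation of the Kintsch and van Dijk (1978) model. The importance ratings are hypothetical inputs. The rule of keeping the newest clause plus the three most important earlier clauses (a capacity of four, matching the figure cited above) is an assumption made for illustration.

```python
def final_buffer(importance, capacity=4):
    """Toy buffer: after each clause, keep it plus the (capacity - 1) most important earlier clauses."""
    held = []
    for idx in range(len(importance)):
        keep = sorted(held, key=lambda i: importance[i], reverse=True)[: capacity - 1]
        held = sorted(keep) + [idx]
    return held


# Hypothetical importance ratings for an eight-clause passage (higher = more important).
ratings = [0.9, 0.2, 0.7, 0.1, 0.3, 0.8, 0.2, 0.1]
print(final_buffer(ratings))
```

Here the final buffer holds clauses 0, 2, 5, and 7: the three most highly rated clauses and the last one. Low-rated middle clauses such as clause 3 have been dropped.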



4 

Another stepwise regression was run for the main idea items using only 
those variables that correlated significantly with main idea item difficulty 
(delta). For the total main idea sample, v58 (frequency special reference) 
and v44 (text sentence words) formed the final solution (these are identical 
to the first two variables in Table 2, which emerged when all predictor 
variables were allowed to enter the solution). V58 (frequency special 
reference) and v44 (text sentence words) account for 15% of the variance, 
F(2,73) = 6.6, p < .01. This result is clearly similar to the 20% accounted 
for in Table 2. For long passages, when only significantly correlated 
variables were allowed into the solution, we can account for 28% of the 
variance, F(2,35) = 6.7, p < .01; again, the first two predictor variables 
reported for long passages in Table 2, v41 (longest paragraph words) and v65 
(main idea middle text), were extracted. For the short passages, the best 
predictors of main idea difficulty were identical to those reported in 
Table 2 [v36 (coherence), v52 (frequency combinations of fronted text 
structures), v16 (words incorrects)], and the F value was identical. Clearly, 
this alternative method of restricting the number of variables allowed into 
the regression solution yields results very similar to those reported in 
Table 2, where the full set of predictor variables was allowed. Also, because 
of the similarity of the final solutions, one can deduce that there are few 
if any suppressor variables present in our data. 

5 

Another stepwise regression was run for the inference items using only 
those variables that correlated significantly with inference item difficulty 
(delta). The best seven variables were v90 (information middle relevant 
paragraph), v79 (information in first sentence), v82 (information in opening 
second paragraph), v7 (negative stem), v16 (words incorrects), v21 
(concreteness), and v13 (negative correct), which together account for 49% of 
the variance, F(7,79) = 10.7, p < .01. This agrees very well with the 
variables found in Table 3, where all predictor variables were allowed into 
the regression. This again shows that restricting the allowed variables to 
those with significant correlations with the dependent variable (delta) 
yields results essentially identical to those of the earlier method. 

6 

Another stepwise regression was run for the explicit statement items using 
only those variables that correlated significantly with explicit statement 
item difficulty (delta). Here the final solution was identical to that 
reported in Table 4. Again this shows that one way to restrict the number of 
variables initially allowed into the regression solution is to sample just 
the significantly correlated variables. All three reading item types showed 
this close agreement between the two approaches to stepwise regression. 
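Notes 4 to 6 restrict entry to variables significantly correlated with delta but do not say how the cutoff was set. The sketch below assumes a two-tailed .05 criterion. Under it, the smallest significant |r| for n items is t / sqrt(t² + n - 2), where t is the critical t on n - 2 degrees of freedom taken from a table; the 1.993 used for 74 df is an assumed table value. The example filters four correlations from the last column of Table 5. The surviving variables would then be passed to a stepwise procedure. The function names are hypothetical.

```python
import math


def critical_r(n, t_crit):
    """Smallest |r| reaching significance for n cases, given the critical t on n - 2 df."""
    return t_crit / math.sqrt(t_crit ** 2 + (n - 2))


def prefilter(correlations, n, t_crit):
    """Keep only the predictors whose correlation with delta reaches the critical value."""
    cutoff = critical_r(n, t_crit)
    return {name: r for name, r in correlations.items() if abs(r) >= cutoff}


# Four correlations from the "all main ideas" column of Table 5 (n = 76).
# 1.993 is an assumed two-tailed .05 table value of t for 74 df.
main_idea_all = {"v36": -0.27, "v41": 0.18, "v59": 0.23, "v65": 0.27}
print(round(critical_r(76, 1.993), 3), sorted(prefilter(main_idea_all, 76, 1.993)))
```

The cutoff comes out near .226, so v41 (r = .18) is screened out while v36, v59, and v65 remain.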
Appendix 

The Correlation between GRE Item Difficulty (delta) and Each Predictor 
Variable for Each of Three Reading Item Types 

(only 2-tail tests reported) 
Table 6 

Correlation between GRE Item Difficulty 
and Each Predictor Variable 

           All          All          All Main     Main Ideas,      Main Ideas, 
Variable   Inferences   Explicits    Ideas        Long Passages    Short Passages 
           (n = 87)     (n = 81)     (n = 76)     (n = 38)         (n = 38) 

v4         -.19          .04          .10          .00              .19 
v5         -.15         -.04                       -- 
v6         -.06         -.19         -.01          .00              .02 
v7         -.20*        -.06                                        -- 
v8          .02         -.15          .00          --               .04 
v9          .06          .15         -.02         -.06             -.02 
v10        -.06          .14          --           --               -- 
v11         .16          .09         -.07         -.07             -.06 
v12         .11          .29***       .06         -.18              .26 
v13         .28***       .03         -.10         -.11             -.08 
v14         .25**        .19          .16          .04              .27 
v15        -.11          .23**       -.04         -.16              .06 
v16         .21**        .28***       .22*        -.04              .41** 
v17        -.10          .01         -.02          .12             -.15 
v18         .06          .16          .12          .04              .28 
v19        -.06          .30***       .09         -.10              .27 
v20        -.14          .02          .09          .11              .06 
v21        -.21**       -.27**       -.11         -.02             -.24 
v22        -.10         -.0?         -.01         -.05              .00 
v23         .15         -.25**       -.10         -.02             -.19 
v24         .04         -.26**       -.10         -.07             -.16 
v25        -.14          .07          .07          .01              .14 
v26         .09          .23**        .05          .07              .05 
v27         .10         -.30***      -.05         -.07             -.05 
v28         .04         -.06          .03                           .06 
v29        -.14          .13          .21          .31*             .11 
v30        -.03          .16         -.03          .05             -.17 
v31         .01          .06          .05         -.07              .15 
v32        -.06         -.11          .04          .11              .03 
v33         .03          .10          .12          .32**           -.03 
v34        -.10         -.22**       -.05         -.19              .06 
v35         .10         -.18**       -.07         -.09             -.05 
Table 6 (Continued)

          All          All          All          Long         Short
          Inferences   Explicits    Main Ideas   Passages     Passages
                                                 Main Ideas   Main Ideas
Variable  (n=87)       (n=81)       (n=76)       (n=38)       (n=38)

v36       -.12          .07         -.27**       -.04         -.48***
v37       -.09         -.07         ?.06         ?.27         ?.25
v38       -.04         -.03         ?.13          .17          .13
v39       -.02         -.09         ?.06         -.21         -.14
v40        .11          .02         ?.07          .10         -.22
v41        .03          .04         ?.18          .43***      -.24
v42        .13         -.06         ?.08          .12         -.22
v43        .11         -.07         ?.04          .12         -.25
v44       -.10          .12         ?.21          .27          .17
v45        .07          .05         ?.04          .35**       -.25
v46       -.04          .18         ?.06         -.17          .02
v47       -.11          .14         ?.20          .40**        .05
v48        .02          .07         ?.10          .10          .08
v49       -.10          .00         ?.14          .05          .17
v50       -.04         -.01         ?.29***       .40**        .20
v51       -.09         -.08         ?.25**        .38**        .16
v52       -.06         -.10         ?.25**        .22          .25
v53        .05         -.06         ?.18          .15          .14
v54       -.05         -.16          .22*         .24          .12
v55        .04          .14          .19          .22          .09
v56       -.03          .12          .06          .07         -.18
v57        .04         -.06          .18          .10          .26
v58        .11          .02          .31***       .30          .30
v59        .05          .03          .23**        .22          .22
v60        .03          .11          .23**        .32**        .05
v61        NA           NA           .20         -.26         -.11
v62        NA           NA           .16         -.23         -.08
v63        NA           NA           .03         -.10          .05
v64        NA           NA           .15         -.23         -.11
v65        NA           NA           .27**        .39**        .02
v66        NA           NA           .00         -.10          .10
v67        NA           NA           .01          .14         -.10
v68        NA           NA           .04          .11          .01
v69        NA           NA           .20         -.28         -.08
v70       -.23**       -.18          NA           NA           NA
v71        .11         -.12          NA           NA           NA
v72        .08         -.12          NA           NA           NA
v73        .11         -.21*         NA           NA           NA
v74       -.11         -.11          NA           NA           NA
v75       -.03         -.08          NA           NA           NA
v76        .10          --           NA           NA           NA

? = sign illegible in the source copy
Table 6 (Continued)

          All          All          All          Long         Short
          Inferences   Explicits    Main Ideas   Passages     Passages
                                                 Main Ideas   Main Ideas
Variable  (n=87)       (n=81)       (n=76)       (n=38)       (n=38)

v77        .14          .26**        NA           NA           NA
v78       -.04          .17          NA           NA           NA
v79       -.34***       .21*         NA           NA           NA
v80        .06          .21*         NA           NA           NA
v81        .14          .16          NA           NA           NA
v82       -.28***       .04          NA           NA           NA
v83       -.21**       -.11          NA           NA           NA
v84        .05          .10          NA           NA           NA
v85        .28***       .25**        NA           NA           NA
v86        .16          .02          NA           NA           NA
v87        .07          .18          NA           NA           NA
v88        .12          .11          NA           NA           NA
v89        .03          .20*         NA           NA           NA
v90        .39***       .28***       NA           NA           NA
v91       -.16          .29***       NA           NA           NA

***  significant, p < .01, 2-tailed
**   significant, p < .05, 2-tailed
*    marginally significant, p < .06, 2-tailed
NA   not applicable
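The significance markers in Table 6 come from two-tailed tests of each correlation. The report does not name its software; assuming the usual t test for a Pearson correlation was used, r is converted to t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. To stay within the Python standard library, the sketch below integrates the Student t density numerically with Simpson's rule rather than calling a statistics package; the function names are illustrative.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def two_tailed_p(r, n, steps=2000):
    """Two-tailed p-value for H0: rho = 0, given a sample correlation r from n cases."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1.0 - r * r))
    # P(|T| > t) = 1 - 2 * (integral of the density from 0 to t);
    # the integral is approximated by Simpson's rule with an even number of steps.
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * total * h / 3.0


for r, n in [(0.39, 87), (-0.20, 87), (0.41, 38)]:
    print(f"r = {r:+.2f}, n = {n}: p = {two_tailed_p(r, n):.4f}")
```

With n = 38, r = .41 gives p just above .01, which is consistent with a correlation of that size in an n = 38 column carrying ** rather than ***.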